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The Allosteric Switching Mechanism in Bacteriophage MS2 
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In this article we use all-atom simulations to elucidate the mechanisms underlying conformational switching 
and allostery within the coat protein of the bacteriophage MS2. Assembly of most icosahedral virus capsids 
requires that the capsid protein adopt different conformations at precise locations within the capsid. It has 
been shown that a 19 nucleotide stem loop (TR) from the MS2 genome acts as an allosteric effector, guiding 
conformational switching of the coat protein during capsid assembly. Since the principal conformational 
changes occur far from the TR binding site, it is important to understand the molecular mechanism underlying 
this allosteric communication. To this end, we use all-atom simulations with explicit water combined with 
a path sampling technique to sample the MS2 coat protein conformational transition, in the presence and 
absence of TR-binding. The calculations find that TR binding strongly alters the transition free energy profile, 
leading to a switch in the favored conformation. We discuss changes in molecular interactions responsible 
for this shift. We then identify networks of amino acids with correlated motions to reveal the mechanism 
by which effects of TR binding span the protein. The analysis predicts amino acids whose substitution by 
mutagenesis could alter populations of the conformational substates or their transition rates. 


I. INTRODUCTION 

The controlled interconversion between protein confor¬ 
mational states is crucial for essential cellular functions, 
including signaling, metabolism, and assembly of the dy¬ 
namic cytoskeleton. A key regulatory role in such pro¬ 
cesses is often played by allosteric effectors, whose bind¬ 
ing favors a particular protein conformation. The tran¬ 
sition pathways by which proteins interconvert between 
these folded states are largely unknown because inter¬ 
mediates along the pathways cannot be directly charac¬ 
terized by experiments. Similarly, it remains poorly un¬ 
derstood how perturbations due to effector binding are 
communicated across the protein to alter its conforma¬ 
tional free energy landscape. In this article we combine 
long unbiased all-atom molecular dynamics (MD) simula¬ 
tions, an efficient pathway sampling algorithm called the 
string method 1 ' 9 , and analysis of inter-residue correla¬ 
tions to characterize a protein conformational transition 
pathway and how it is affected by effector binding. In 
particular, we study the conformational transition of the 
MS2 coat protein dimer, and how the binding of an RNA 
stem loop from the MS2 genome acts as a molecular con¬ 
formational switch that guides protein assembly into an 
icosahedrally symmetric capsid. 

MS2 is a small bacteriophage that infects male E. Coli. 
During virus assembly, 180 copies of the coat protein 
(CP) spontaneously assemble around a 3,569 nucleotide 
single-stranded RNA genome to form an icosahedral cap¬ 
sid. The capsid is a T=3 structure, meaning that the CPs 
adopt three conformations (termed A,B,C) which are pre¬ 
cisely arranged within the capsid 10 . Major structural dif¬ 
ferences among the protein conformations are confined to 
the FG loop, which in the A and C conformations forms 
an anti-parallel /Thairpin, but in the B conformation is a 
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flexible loop pulled back against the dimer with a small 
rt-helix kink. The A and C monomers are thus nearly 
identical, and their FG loops meet at 20 3-fold (quasi-6- 
fold) axes, whereas the FG loops of the B monomers meet 
at the 12 5-fold interfaces. In solution, the monomers 
form stable, non-covalent dimers, which are the basic as¬ 
sembly subunits and will be denoted as CP 2 (Fig. 1 b,c). 
Formation of the capsid thus requires that 30 CC and 
60 AB dimers associate and arrange themselves into the 
icosahedral geometry (Fig. 1 a). 

Based on structural studies, in vitro assembly assays, 
and modeling, it has been proposed that allosteric in¬ 
teractions between CP 2 and the viral genome guide con¬ 
formational selection during MS2 assembly 12 ' 15 . Capsid 
assembly can be triggered in vitro by the addition of a 
19-nucleotide RNA stem-loop (TR) fragment from the 
genome. TR encompasses the start codon for the repli- 
case protein, and has been shown to bind strongly to 
the bottom of CP 2 16,17 . In the crystal structure, TR is 
bound to the CC dimers in two symmetric orientations, 
while steric constraints allow only a single orientation for 
the AB dimer (Fig. 1 g,h). 

In vitro experiments by Stockley and coworkers 12 on 
wild-type CP showed that, in the absence of genomic 
RNA, CP assembles slowly and produces only a low 
yield of capsids. Adding a molar ratio of TR results 
in a strongly bonded CP 2 :TR complex that is kinetically 
trapped. However, adding an equal molar ratio of CP 2 to 
CP 2 :TR results in rapid and efficient assembly. Further¬ 
more, NMR studies on an assembly-incompetent mutant 
MS2 coat protein (Trp82Arg), showed that TR binding 
induces a conformation change from a symmetric dimer 
(presumably BB-like) to an asymmetric dimer (presum¬ 
ably AB-like). 

Based on these observations, it was proposed that dur¬ 
ing assembly of wild-type (WT) MS2 capsid proteins, TR 
binding acts as a molecular switch which favors a confor¬ 
mational change from the symmetric CC dimer to the 
asymmetric AB dimer 12 . Since both AB and CC dimers 
are needed for efficient assembly, this scenario is consis- 
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FIG. L MS2 Capsid geometry and subunit structure, (a) The complete T=3 MS2 capsid of 27.5nm diameter is comprised of 
30 CC and 60 AB dimers. It has icosahedral symmetry with the 5-fold vertices as AB dimers and the 3-fold vertices as 3 AB + 
3 CC dimers, (pdb ID: 1BMS) (b)-(c) the AB and CC dimers colored according to their conformation. The B conformation 
differs significantly from the A and C conformations in the FG loop, (d)-(e) A close up view of the FG loop with a selection of 
side chains shown as bonds. The B conformation lacks the hydrogren bonds found in the A and C conformations (and shown in 
yellow), (f) The nucleic acid sequence of the TR stem loop, which binds with high affinity to the base of the MS2 dimer. The 
sequence positions of the adenines that bind most strongly are labeled in red (-10 and -4). (g)-(h) MS2 AB and CC dimers 
shown with the RNA stem loop (TR) bound to their base (pdb ID: 2BU1 11 ). The RNA can adopt two symmetric positions 
for the CC dimer (only one shown), but the AB dimer allows only one position due to steric collisions. The RNA is shown as 
grey VDW spheres. 


tent with the observation that pure solutions of either 
CP 2 (assumed to be CC) or CP 2 :TR (assumed to be 
AB:TR) are kinetically trapped whereas an equal molar 
ratio of CP 2 to CP 2 iTR results in rapid and efficient as¬ 
sembly. Subsequent theoretical models suggest that such 
a conformational switch is consistent with existing struc¬ 
tural data and assembly kinetics 13-15,18 . 

Since TR binds CP 2 17 (Fig. 1 g,h) about 12 A from 
the FG loop where the conformation change is localized, 
there is great interest in understanding the molecular 
mechanism underlying the apparent allosteric communi¬ 
cation between these two regions of the protein. Using 
all-atom normal mode analysis, Dykeman et al. 13 found 
that TR binding to an initially symmetric CC conforma¬ 
tion leads to asymmetries consistent with the AB con¬ 
formation. Namely, fluctuations of residues near the FG 
loop on the A* chain (meaning the chain that corresponds 
to the A chain in the AB dimer conformation) are sup¬ 
pressed, whereas those near the B* FG loop increase. 

The goal of this paper is to directly calculate the MS2 
capsid protein conformational free energy landscape, to 
learn how it is altered by the binding of the genome 
fragment TR, and to elucidate the molecular basis by 
which perturbations caused by TR binding are commu¬ 
nicated across the protein. To this end, we employ the 
string method 1 5 to identify and characterize the most 
probable transition pathways and associated free energy 
profiles for the conformational transition in the presence 
and absence of TR, using all-atom simulations with ex¬ 
plicit water. Furthermore, to directly probe the molec¬ 
ular basis for allosteric communication, we characterize 
correlations of amino acid conformational statistics and 
motions within long, unbiased MD simulation trajecto¬ 
ries. These combined calculations demonstrate that the 
conformational transition is a complex, multi-step pro¬ 


cess with multiple metastable minima, and is stabilized 
by multiple molecular-scale interactions whose statistics 
can be altered by molecular binding in disparate regions 
of the protein. The analysis predicts several amino acids 
whose substitution by mutagenesis could alter popula¬ 
tions of the conformational substates or their transition 
rates. These findings may shed light on the mechanisms 
by which molecular binding affects conformational free 
energy landscapes in a wide variety of proteins, as well 
as for understanding the diverse roles of RNA in viral 
assembly. 

Previous computational works have used enhanced 
sampling methods to examine the effect of small molecule 
substrates on protein interconversion pathways and free 
energies, with a particular focus on the enzyme adeny¬ 
late kinase 19 23 . Most closely related to our work, Pat¬ 
tis and May investigated the effect of RNA binding on 
the Lassa Virus nucleoprotein conformational free energy 
landscape 24 . 

This article is arranged as follows. In section II we de¬ 
scribe the model, simulations, and methodologies used 
to sample the transition. In section III, we describe 
the transition pathways predicted by the string method 
in the presence and absence of TR, we highlight some 
residues found to play key roles in stabilizing the tran¬ 
sition based on the converged strings, and we present 
results of mutual information on correlations between 
amino acid conformations and motions. Finally, in sec¬ 
tion IV we discuss implications of these results for under¬ 
standing the mechanism underlying the conformational 
transition and how it is influenced by TR binding. Ad¬ 
ditional methodological details and validations are given 
in the appendices. 
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II. METHODS 

A. Systems and simulations 

Systems. For statistical analysis and for generating 
beginning and end points for string method calculations, 
we initialized unbiased MD simulations from two MS2 
capsid protein dimer conformations, each in the presence 
and absence of the RNA stem loop TR. We denote the 
four systems as AB, CC, AB:TR, and CC:TR. To avoid 
complications associated with the fact that P78 under¬ 
goes a cis to trans switch between conformations, we 
studied P78N mutants, which assemble complete capsids 
but are not infectious 25 . The AB and CC dimer struc¬ 
tures were therefore extracted from a crystal structure 
of the empty P78N capsid (pdb ID 1BMS 26 ). Since no 
crystal structure for P78N capsids with TR is available, 
we extracted AB:TR and CC:TR from a wild type MS2 
capsid containing TR (pdb ID 2BU1), and performed the 
P78N mutation in silico using VMD 27 . The first and last 
bases in the RNA stem loop (A and U) for 2BU1 are 
missing, and were added using CHARMM 28 . 

Each of the four dimer structures was solvated with at 
least lnm of water on each side of the structure. The 
resulting simulation boxes were approximately 10.2nm x 
7.7nm x 5.6nm for CP 2 and 10.6nm x 7.2nm x 7.5nm for 
CP 2 :TR. We ensured that each pair of systems intended 
to serve as beginning and end points of the same string 
(AB, CC) and (AB:TR, CC:TR) had the same number 
of atoms. Water molecules were replaced at random with 
Na + and Cl - ions to neutralize the charge and to bring 
the ionic strength to 0.1M. The total system size was ap¬ 
proximately 41,000 atoms for CP 2 and 58,000 atoms for 
CP 2 :TR. During equilibration, an orientation restraint 
was added to keep the dimer from self-interaction across 
the periodic boundary. For long unbiased MD calcu¬ 
lations, larger water boxes (approximately 10.2nm 3 for 
both CP 2 and CP 2 :TR) were used with no orientation 
restraints. Details about the equilibration protocol are 
given in Appendix A 1. 

Simulations. Simulations were performed with ver¬ 
sion 4.5.5 of Gromacs 29 modified with version 1.3.0 of 
the plugin PLUMED 30 , which was used to generate all 
restraints and monitor collective variables. Version 5.0.5 
of Gromacs was used for the long unbiased simulations. 
The CHARMM36 all-atom forcefield 31 was used to rep¬ 
resent the system and the TIP3P model 32,33 was used 
for water molecules (The string simulations used the 
CHARMM22/CMAP forcefield 34,35 for proteins as they 
were partly performed before the CHARMM36 force- 
field was available in Gromacs). Bond lengths were con¬ 
strained using the LINCS algorithm 36 with order 4. The 
NPT ensemble was simulated using velocity rescaling 
for the temperature coupling and the Parrinello-Rahman 
barostat for pressure coupling 37,38 . Electrostatic inter¬ 
actions were calculated using the particle-mesh Ewald 
(PME) algorithm 39 , with a grid spacing of 0.12 and real- 
space interactions cut off at 1.2nm. Van der Waals inter¬ 


actions were switched at l.Onm and cut off at 1.2nm. 


B. The String Method Algorithm 

To determine the minimum free energy transition path¬ 
ways (MFTP) for the AB^CC and AB:TR^CC:TR 
conformations, we used the string method algorithm 
in collective variables, which was first presented by 
Maragliano et al 5 . While a number of powerful meth¬ 
ods have been developed to sample transition pathways 
and other rare events (e.g., 6 ’ 7,40-59 ), the string method 
provides a means to discover the MFTP in a space of 
many collective variables (CVs), with a computational 
expenditure that is nearly independent of the number of 
CVs. To obtain a meaningful free energy minimum, the 
collective variables must include all slow degrees of free¬ 
dom relevant to the transition. Our collective variables 
were chosen to be a subset of the atomic positions 6,8,60,61 . 

The method can be summarized as follows. A set 
of collective variables capable of characterizing all rel¬ 
evant slow degrees of freedom in the system is chosen 
(section IIC). An initial pathway connecting the two 
stable states is discretized as an ordered sequence of 
states (called images) and represented as a curve (called a 
string) in the multidimensional collective variable space. 
An iterative calculation is then performed to relax the 
initial pathway toward a minimum free energy pathway: 
(1) For each image, multiple short MD simulations are 
performed in which sampling is constrained to the vicin¬ 
ity of that image in collective variable space by a har¬ 
monic bias potential. The gradient of the free energy 
in collective variable space at each image is calculated 
from the average force imposed by the bias potential. (2) 
The position of each image in collective variable space is 
incremented by displacing it along the (negative) direc¬ 
tion of the free energy gradient. (3) Images are redis¬ 
tributed to maintain uniform spacing in arc length along 
the string. Steps (1-3) are repeated until the string con¬ 
verges to within desired precision. The free energy profile 
along the converged string can then be calculated using 
umbrella sampling (section IID). 

Details about these procedures, the chosen set of co¬ 
ordinates, and assessments of convergence are given in 
Appendix B. 


C. Selecting Collective Variables 

A string is defined by a set of collective variables (CVs), 
which must include all slow degrees of freedom that are 
relevant to the reaction. It is not known a priori which 
CVs constitute a good reaction coordinate. While in 
principle it is possible to choose a large number of CVs 
with the expectation that a subset of them will constitute 
a good reaction coordinate, extraneous or redundant CVs 
can introduce noise that slows convergence. Thus, our 
goal was to select the minimal possible CV set sufficient 
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to describe both the CP 2 and CP 2 iTR conformational 
transitions. 

Based on extensive trial calculations using various 
types of CVs (including distances, positions, and dihedral 
angles), we found Cartesian positions of individual atoms 
to be best suited for the study of MS2. Atomic positions 
have been successfully used in previous studies 6,8,60,61 
and can capture both the native CC backbone hydrogen 
bonds breaking/forming and the formation of the a-kink 
in the AB state. 

Because absolute positions are not invariant under 
rigid body motions, we restricted translational and ro¬ 
tational diffusion by including position restraints on 10 
C a atoms in the top helices of each monomer in CP 2 
(residues 105-109). These residues are far from the RNA 
binding site and FG loop (where the conformational 
change is localized). In an alternate approach, Ovchin¬ 
nikov et al. performed principle component analysis on 
the rigid core of the protein to define a body-centered 
coordinate system 6 . Another approach is to perform on 
the fly structural alignment 62 . 

We followed the approach of Ref. 6 to select the set of 
atoms whose positions comprise the CVs. We ran a series 
of targeted molecular dynamics simulations (TMDs) 63 , in 
which external biasing forces were applied to the candi¬ 
date atoms to force the system between conformations. 
Each candidate set of atoms was ranked by the difference 
in backbone dihedral angles between the final structure 
and the target, and the amount of RMSD drift observed 
during 4ns of simulation after all restraints were released. 
This test was performed on TMD simulations in both di¬ 
rections (AB to CC and CC to AB) for both CP 2 and 
CP 2 :TR. Once a set of CVs was chosen in this manner, 
redundant or extraneous atoms were eliminated through 
a trial and error process in which candidate removals were 
tested by additional TMD simulations. The final set of 
CVs contains the positions for 40 atoms, listed in Fig 2. 


D. Free Energy Along String Pathway 


To calculate the free energy profiles from converged 
strings, we performed umbrella sampling on an order pa¬ 
rameter s that gives the position along the string path. 
Our implementation is based on the approach described 
in Ref 64 . To ensure that sampling does not meander 
arbitrarily far in directions transverse to the transition 
tube defined by the string 8 , we also defined an order pa¬ 
rameter z, which measures the distance from the string. 
The definition of s and 2 are inspired by the path collec¬ 
tive variables in PLUMED 30 ; for an arc length between 
images i and i + 1, s and 2 are given by 


s = i + 


z 2 = 


(y - gO • (gi +1 - g0 

Oi+i- 0i\ 2 

y - 0i' 2 


@i+l ~ Oi 


- {s-iy 


(i) 



String CV List: K66:CA, K66:0, V67:CA, A68:N, A68:CA, A68:0, T69:CA, Q70:N, 
Q70:CA, Q70:CB, Q70:O, T71:CA, T71:CB, V72:N, V72:CA, V72:0, G73:CA, 

G74:CA, V75:N, V75:CA, V75:0, E76:CA, E76:CG, L77:N, L77:CA, L77:CG, L77:0, 
N78:N, N78:CA, N78:CB, V79:N, V79:CA, V79:0, A80:CA, A81:N, A81:CA, W82:CA, 
W82:CB, W82:NE1, W82:CH2 

Orientation Restraint Atoms (on both chains): V105:CA, K106:CA, A107:CA, 
M108:CA, Q109:CA 

FIG. 2. The atoms whose positions were selected as CVs 
for the string are shown for both the AB and CC FG loops. 
The backbone and a selection of the side chains are shown 
as bonds, and the string atoms are shown as purple spheres. 
The string atoms are listed in the text below the figure along 
with those atoms whose positions were used to prevent trans¬ 
lational and rotational diffusion. 


where Qi gives the vector of CV coordinates defined by 
image i, y is the dynamic vector of CV coordinates dur¬ 
ing sampling, and s is the projection of y onto the line 
segment between the bounding images (#i+i — 0$), scaled 
by the image separation. 

During umbrella sampling approximately 150 window 
centers were spaced evenly in s, with a spring constant of 
k ~ 350kJ. To maintain sampling near the center of the 
transition tube 8 , a half-harmonic, upper wall potential 
was placed between z = 2 and z = 3, with spring constant 
«wall ~ 450kJ. 

To check for hysteresis, each window was seeded us¬ 
ing steered MD simulations from two different starting 
structures, one from each of the upper and lower bound¬ 
ing images. The two seeds for each of the ^150 windows 
were then each sampled for 200ps, so the total simulation 
time for each free energy calculation was 60 ns. The free 
energy was calculated from this data using Alan Gross- 
field’s implementation of WHAM (Weighted Histogram 
Analysis Method) 65 . 


E. Root of Mean Squared Fluctuations (RMSF) 

The root of mean squared fluctuations (RMSF) for 
each amino acid about an average structure were cal¬ 
culated for each of the CP 2 and CP 2 :TR systems. First, 
a single 300ns unbiased trajectory was run for each of 
the four systems and seed structures were extracted as 
starting points for new trajectories. For the CC, CC:TR 
and AB:TR system each of the 7 seeded trajectories was 
450ns long of which the last 368ns was used in the calcu¬ 
lation, resulting in 2.576/rs of sampling for each system. 
For the AB system there were 6 seeded trajectories of 
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530ns length each, from which the last 430ns was used 
in the calculation, resulting in 2.58/rs of sampling. Con¬ 
figurations were outputted every 25ps for all the seeded 
unbiased MD trajectories. The structures from the tra¬ 
jectories were first aligned to minimize the mass-weighted 
RMSD of the C a atoms that comprise the core of the pro¬ 
tein (residues 7-63 and 85-124 of each monomer). Using 
the aligned structures, the RMSF was calculated with re¬ 
spect to the average structure and then averaged over all 
non-hydrogen atoms in each amino acid. 

F. Mutual Information 

We calculated the mutual information (MI) between 
all pairs of amino acids for all CP 2 and CP 2 :TR systems 
using the approach and Mutlnf program developed by 
McClendon et al 66 . In this approach, the MI is calcu¬ 
lated using second order terms from the configurational 
entropy expansion, and indicates the correlation between 
backbone and side chain conformations 66 . It is calculated 
using internal coordinates (i.e. the <p and ip backbone di¬ 
hedrals and side chain rotamers). Amino acids that have 
shared mutual information have correlated dihedral dis¬ 
tributions. Correlated distributions can arise through di¬ 
rect interaction, a chain of interactions, backbone move¬ 
ments, solvent rearrangement, or other mechanisms. 

For each system, we applied the Mutlnf program to 
the microseconds of multiple trajectories we used in 
the RMSF calculation. We used 24 bins per degree-of- 
freedom, and results from 20 sets of scrambled data were 
calculated and subtracted in order to determine the ex¬ 
cess mutual information, as done by McClendon et al 67 . 
We then used hierarchical clustering on the resulting MI 
matrix to identify groups of amino acids that share signif¬ 
icant mutual information (as done in Ref 66 ). We gener¬ 
ated the dissimilarity matrix as given in Eq. 2, and used 
a Euclidean distance metric to cluster amino acids. 

Dij = Max(MI) - MUj (2) 

We systematically extracted the largest possible “real” 
clusters by recursively splitting the hierarchy of clusters 
until each cluster achieved an MI average greater than 
a given cutoff value. After generating the clusters, we 
verified that they were valid (i.e. had high intra-cluster 
MI averages and small inter-cluster MI averages) using 

cluster av g = ^ E E ( 3 ) 

where W m is the set of amino acid numbers that belong 
to cluster m. The sums loop over all residues in W m and 
W nj and N is the total number of elements in the sum 
such that i ^ j. For an intra-cluster average (m = n), 
all amino acid self-correlations are ignored (i.e. i ^ j). 

From the MI network we calculated the node between¬ 
ness centrality for each residue. The node betweenness 


centrality determines the number of shortest distance 
paths that go through each node of the network and is 
an indicator of the importance of that node in the com¬ 
munication of the network 68 . We used the tnet package 
available through the statistical software R for the calcu¬ 
lation of the betweenness centralities 69 . 


III. RESULTS 

A. Conformational Transition Pathway 

In this section we compare the calculated most proba¬ 
ble conformational transition pathways for the CCv^AB 
and CC:TR^AB:TR MS2 coat protein dimer intercon¬ 
versions. The free energy profiles calculated from the 
converged strings and illustrative snapshots from the con¬ 
verged strings are shown in Fig. 3. Details on these calcu¬ 
lations can be found in Sections II:A,B and Appendix B. 
Furthermore, several tests of convergence are discussed in 
Appendix B 3. Most significantly, an independent string 
started from a different initial pathway produced a simi¬ 
lar transition pathway and free energy profile (Fig. A3). 

1. Pathway in the Absence of TR 

The CC^AB calculation obtains that the symmetric 
CC state is favored over the AB state by a free energy of 
« 3 &bT, and there is one on-pathway metastable state. 
The string pathway indicates the following order of events 
for a CC to AB transition with each number correspond¬ 
ing to the snapshots in Fig. 3a,c: The CC loop bends 
inward, straining the native backbone hydrogen bonds (I 
—> II) and eventually breaking them (III), beginning with 
the bonds closest to the core of the protein (and Trp82). 
After all of the native hydrogen bonds are broken, the 
FG loop opens becoming partially solvated and Trp82 
leaves its hydrophobic pocket resulting in a metastable 
state (IV) with a free energy difference of about 12 &bT 
compared to the native CC structure. The FG loop must 
now widen to accommodate further rotation of Trp82 (V) 
paying a ~ 5 &bT free energy penalty before collapsing 
and rearranging into the final AB substate (VI). 

2. Pathway in the Presence of RNA 

Complexation of CP 2 with the RNA stem loop TR 
dramatically shifts the free energy landscape, causing 
the AB:TR substate to be favored over the CC:TR sub¬ 
state by « 20 /cbT. The transition pathway is also 
markedly different from CC^AB, and now involves two 
on-pathway metastable states (Fig. 3b,d). In the con¬ 
verged string, the transition from CC:TR to AB:TR pro¬ 
ceeds by the following sequence of events with each num¬ 
ber corresponding to the snapshots in Fig. 3b,d: The first 
two backbone hydrogen bonds near the base of the FG 



6 






CC:TR 




vi 



FIG. 3. The most probable transition pathways and associated free energy profiles for CC^AB and CC:TR^AB:TR. (a), 
(b) The free energy along the most probable pathway as a function of arc length a along the converged strings, (c), (d) Close 
up snapshots of the B* FG loop along the transition pathway for CC^AB (c) and CC:TR^AB:TR (d). The native backbone 
hydrogen bonds of the CC monomer are shown in pink, and side chains with atoms selected as string collective variables (CVs) 
are shown as bonds. The labels correspond to the position along the free energy profile as indicated in (a) and (b). 


loop break (i —>• ii). The backbone dihedral angles of 
amino acids 79-81 move toward their eventual position 
in the <a-kink of AB:TR. This represents the first barrier 
to the transition, of ~ 7 &b T. It is now free energeti¬ 
cally favorable for Trp82 to rotate out of the hydrophobic 
pocket toward chain A, which forms the first metastable 
state (iii). Interestingly, this rotation proceeds in the op¬ 
posite direction as found in the CCv^AB transition. A 
free energy penalty of ~ 7 &bT must be paid to reach 
state iv, which involves side chain rearrangements and 
further solvation of the FG loop. Finally, Trp82 rotates 
into the FG loop, which then spontaneously collapses, 
resulting in the second metastable state (v). This state 
is structurally very similar to the final AB:TR state (vi), 
and only « 2 &bT higher in free energy. A final rotation 
of Trp82, involving a ~ 4 &b T barrier, leads to the final 
AB:TR state (vi). 


3. Comparison of Pathways 


The most striking difference between the ABv^CC and 
AB:TR^CC:TR strings is the shift in the most stable 
sub-state upon TR binding, from CC to AB:TR. This 
population shift is consistent with experimental data 12 . 
Both pathways highlight the important role of the large 
side chain of Trp82 in determining the sequence of events 
during the conformation change. The strings show that 
the native CC backbone hydrogen bonds require much 
more substantial molecular rearrangements before break¬ 
ing and allowing rotation of Trp82 in the CCv^AB tran¬ 
sition as compared to CC:TRv±AB:TR. This difference 
suggests that the binding of TR destabilizes these hydro¬ 
gen bonds, which would contribute to shifting the popu¬ 
lation toward the AB:TR state. 
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B. Root of Mean Squared Fluctuations 

The residue based RMSFs were calculated for the four 
dimer systems using microseconds of unbiased MD sim¬ 
ulations for each system as described in Sec. HE. The 
RMSF values give the typical fluctuations for each amino 
acid and provide insight into the relative flexibility of dif¬ 
ferent portions of the protein. These results are summa¬ 
rized in Fig. 4. 

Upon TR binding to the CC dimer, the largest change 
in the RMSF occurs in residues 23-30 of the CD loop, 
where the fluctuations decrease similarly in the A* and 
B* monomers (Fig. 4a). Similarly, the loops in both 
the A* and B* monomers of the RNA bound systems 
have lower RMSF than those in the non-bound systems 
(Fig. 4d,e). The CD loop in the CC dimer has higher 
fluctuations than in any other system. The effect of TR 
binding on the CD loop can be explained by noting that 
residues Asn27 and Val29 are in direct contact with TR. 

While the CC dimer is symmetric, it is possible that 
the transition in the B* FG loop to the AB conformation 
is due in part to asymmetries that arise in dimer fluctu¬ 
ations upon TR binding. The TR induced asymmetry in 
the fluctuations of the CC:TR system is evident in the 
RMSF of residues 49 to 53 leading up to and including 
part of the EF loop, which have higher RMSF in the A* 
monomer than in the B* monomer (Fig. 4a). This asym¬ 
metry is consistent with what was found in a previous all 
atom normal mode analysis by Dykeman et al. 13 , which 
showed that for WT MS2 the B factor, which is directly 
related to the RMSF, of the EF loop decreased in B* 
and increased in A*. The RNA binding to the AB dimer 
has the same effect on the EF loop of the B* monomer 
(Fig. 4c). This effect on the EF loop can be explained by 
the fact that this part of the protein in the B* monomer 
is in direct contact with the TR. While Dykeman et al. 
also found that the B factor of the FG loop in B* in¬ 
creased upon TR binding to the CC dimer, we find the 
dynamics of the FG loops of the CC:TR system to be 
similar to the symmetric dimer. However, the RMSF of 
residues Trp82, Arg83, Tyr85 and Leu86 following the 
FG loop are suppressed in the A* monomer (Fig. 4a). 

A significant difference between the CC:TR and 
AB:TR systems is in the FG loop, which has higher 
RMSF in the AB:TR system for both the A* and B* 
monomers (Fig. 4b,c). Another significant difference be¬ 
tween the AB:TR and other systems is in the GH-loop 
of the B* monomers, where residues 95-100 have higher 
RMSF in B* monomer of AB:TR 4c,f. The final effect 
to note is that TR binding in the AB system leads to a 
dramatic decrease in the dynamic fluctuations of the FG 
loop of the B* monomer, where in the AB:TR system the 
dynamics of the loop is much more confined (Fig. 4f). It 
is important to note that these differences in fluctuations 
in the B* monomer take place far from the TR binding 
residues and are a result of long range allosteric commu¬ 
nications. 

To validate that the sampling is sufficient for this anal¬ 


ysis we present the RMSF of the EF and FG loops of the 
CC:TR dimer using half of the total trajectory frames 
in the Appendix C, and show that the results are the 
same as those using all the frames. In the next section 
we look at how the changes in the dynamics of TR bind¬ 
ing residues are communicated to the rest of the protein, 
including the FG loop. 


C. Mutual Information 

Cluster analysis to determine groups of corre¬ 
lated residues. To characterize residue conformational 
correlations, we also calculated the mutual information 
between the dihedral angles of all pairs of amino acids 
from the long unbiased trajectories. We then clustered 
residues based on their pairwise mutual information to 
identify groups of residues that are strongly correlated. 
The intra-cluster averages used as cutoffs for each sys¬ 
tem were: 0.035/c B T for CC, 0Mk B T for AB, 0.07 k B T 
for CC:TR and 0.1k B T for AB:TR. For further discus¬ 
sion on the cutoffs and the resulting number of clusters 
see Appendix D. To assess sampling convergence, we 
compared the raw MI data and calculated clusters from 
all 7 trajectories (2.576/is) to results obtained using only 
4 trajectories (1.472/rs). The Pearson’s correlation co¬ 
efficient for the MI calculated from the full and partial 
data sets is 0.896, and clusters obtained from the partial 
data set (Appendix D) are consistent with those from the 
larger data set. 

The clusters from the full data set are presented in 
Fig. 5 for each of the dimer systems. In the CC and AB 
dimers, there is one large cluster that encompasses the 
majority of residues on the TR binding face of the protein 
(blue in Fig. 5), including both the A* and B* FG loops. 
Small clusters of residues form at the opposite face of the 
protein, each primarily containing few residues from the 
helical domains. 

TR binding changes the clustering, particularly of the 
FG loops. In CC:TR both the A* and B* FG loops are 
clustered separately from the main body of the protein 
and regions in direct contact with TR. Trp82, which was 
found by the string calculations to be crucial for the con¬ 
formational transition, is also clustered separately from 
the main protein body in both A* and B* conformations. 
This implies that the B* FG loop is relatively indepen¬ 
dent from the TR-bound residues in CC:TR. However, 
in AB:TR the B* FG loop is clustered with the main 
body of the protein that is in contact with TR. Hence, 
the conformation of the B* FG loop in AB:TR is mod¬ 
ulated by the bound TR. Recall that the RMSF results 
showed that the B* FG loop in AB:TR is much more 
confined than in the AB system. These two results to¬ 
gether strongly suggest that the B* FG loop’s dynamics 
is modulated by the TR in the AB:TR conformation. 
The AB:TR system also stands out as it has higher total 
MI in the system (Fig. 6a) compared to the other three 
dimers. Hence, not only does the clustering and RMSF 



B* Monomer 


(a) (b) 




FIG. 4. The RMSF for each residue (averaged over all non-hydrogen atoms within a residue) for each CP 2 and CP 2 :TR 
system, (a)-(f) The RMSF as a function of residue number with the important loops and turns labeled in (a). To facilitate 
comparison of the dynamics between different systems, each plot shows as a reference the RMSF for the CC dimer, averaged 
over the two symmetric monomers, as well as the two additional systems, (g) A view of the CC dimer with the loops and turns 
labeled according to convention 70 . The residues Val29, Thr45, Ser47, Thr59, Lys61 which form the TR binding pocket on each 
monomer are shown in red. 



FIG. 5. Mutual information clusters for all the CP 2 and 
CP 2 MR systems. The clusters of correlated residues, cal¬ 
culated from the mutual information between pairs of amino 
acids as described in the text, are shown in different colors. 


data show that TR binding plays a key role in modulating 
the B* FG loop through residue correlations in AB:TR, 
but also that the overall intra-protein communication is 
stronger in AB:TR than in the other three systems. 

Betweenness centrality identifies communica¬ 
tion pathways. To gain a molecular-scale understand¬ 
ing of how residue conformational information is trans¬ 
ferred across the protein, we filtered the complete mu¬ 
tual informational data to include only pairs of residues 
in direct contact, defined as residues that are 5.5A or 
closer for 75% of the simulation time. We then used 
this contact-filtered MI graph to calculate the between¬ 
ness centrality for each residue as discussed in Sec. IIF. 
Betweenness centrality represents the number of short¬ 


est paths between all node pairs on the MI graph that 
pass through a given residue, and is thus a measure 
of how important each residue is for information flow 
through the network. The betweenness centrality has 
been used in previous work to determine the importance 
of nodes in networks constructed from residue-residue in¬ 
teraction energies 71 . Here, the network is constructed 
using residue-residue correlations. In Fig. 6b we show 
the 11 residues with the highest betweenness centrality 
(> 10 4 ) for the AB:TR system. We find that there is 
a group of consecutive residues with high centrality on 
the B* G /3-strand (Tyr85, Leu86, Asn87 and Met88), 
including Tyr85 which has highest centrality in the sys¬ 
tem (Fig. 6c). Two other residues with high centrality are 
Val64 on the B* F /3-strand, with the second highest cen¬ 
trality, and Lys66 in the B* FG loop. These results thus 
suggest a communication network in the protein in which 
the majority of communication travels through the spine 
of four residues on the B* G /3-strand and is then coupled 
to the B* F /3-strand and FG loop. Perturbations due to 
TR binding passed along this pathway are thus strongly 
coupled to the FG loop conformation. In contrast, the 
same group of residues does not exhibit high centrality in 
the A* monomer, providing an explanation for why TR 
binding does not affect the A* FG loop conformation. 

Considering the residues with high centrality (> 10 4 ) 
in the other three dimer systems gives further insight 
into how intra-protein communication is modulated by 
TR binding (Fig. 7). For the CC dimer, there is a group 
of 3 residues (Glu63, Tyr85, and Leu86), that appear 
in both the A* and B* monomers and are outlined in 
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FIG. 6. Total MI of each CP 2 and CP 2 :TR system and the 
betweenness centralities of the AB:TR residues, (a) The to¬ 
tal MI of each system is compared. Calculation of the total 
MI of the CC system for two different sampling sizes gives 
an estimate of the spread in total MI values due to sam¬ 
pling size variations. The AB:TR system is found to have 
stronger correlations than the other three systems, (b) The 
betweenness centralities of each residue in the AB:TR sys¬ 
tem. Residue numbering starts at the A* monomer and ends 
at the B* monomer, (c) The residues with betweenness cen¬ 
tralities greater than 10 4 are represented on a structure of 
the AB:TR system in stick representation. The color scale 
corresponds to the relative betweenness /3, which is the ra¬ 
tio of the given residue’s betweenness centrality divided by 
the maximum residue betweenness centrality in the system. 
The residues with betweenness centralities greater than 10 4 
are Lys57, Glu63, Arg83, and Leu86 in the A* monomer and 
Val64, Lys66, Tyr85, Leu86, Asn87, Met88, and Metl08 in 
the B* monomer. 


Fig. 7. These residues highlight the symmetry that ex¬ 
ists in the CC dimer. However, the highest centrality 
residues are not completely symmetric between the A* 
and B* monomer. For example, residue Met88 appears in 
the B* monomer but not in the A* monomer. We expect 
these asymmetries to be resolved with greater sampling, 
but this result also highlights the fact that the instanta¬ 
neous configurations of the CC dimer are not perfectly 
symmetric due to fluctuations. The CC:TR results show 
that TR binding breaks the symmetry between the A* 
and B* monomers present in the CC dimer, since the 
high centrality residues present in the B* monomer are 
not present in the A* monomer. However, there is a 
similarity between the B* monomers in CC and CC:TR, 
as residues Tyr85, Leu86 and Met88 in the G /3-strand 
appear in both dimers. Tyr85, Leu86 and Met88 also 
appear in both the A* and B* monomers of the AB 
dimer and the B* monomer of AB:TR. Hence, these 
three residues are important for communication in the 
B* monomer in all of the dimer systems. 



FIG. 7. The relative betweenness /3, as described in Fig. 6, 
shown for residues with a betweenness centrality greater 
than 10 4 for the CC, CC:TR and AB systems. The three 
residues Glu63, Tyr85 and Leu86 that appear on both the A* 
and B* monomers of the symmetric CC dimer are outlined. 
Tyr85 which appears as a high betweenness residue in the B* 
monomer of all of the CP 2 and CP 2 :TR systems is marked on 
the CC:TR structure. 


IV. DISCUSSION AND CONCLUSIONS 

We have combined the string method, free energy cal¬ 
culations, and analysis of long unbiased molecular dy¬ 
namics simulations to characterize the effect of binding of 
the MS2 genome fragment TR to its capsid protein. The 
calculations demonstrate that the impact of TR binding 
is substantial and far-reaching. The free energy profiles 
calculated from our converged strings for the CCf^AB 
and CC:TR^AB:TR transitions (Fig. 3) show a strong 
shift in the favored population from CC to AB:TR. Fur¬ 
thermore, the strings indicate that TR binding dramati¬ 
cally alters the interconversion pathway, changing the se¬ 
quence of events and the nature and number of intermedi¬ 
ate metastable states. Given that TR binds more than a 
nanometer from the the residues which undergo the ma¬ 
jority of conformational rearrangement (the FG loop), 
our calculations provide direct evidence for allostery and 
begin to reveal its underlying mechanisms, albeit within 
the limitations of force field accuracy and finite sampling. 

The fundamental effect of TR-binding is to generate an 
inherently asymmetric dimer. The CC—)>AB transition 
requires a spontaneous fluctuation that breaks the CC 
symmetry and ‘chooses’ which FG loop will interconvert 
to a B conformation. In contrast, TR-binding introduces 
subunit-spanning asymmetries that favor transition of 
one chain. We characterized these asymmetries, and how 
they are transmitted across the protein, by analyzing col¬ 
lective motions and correlated conformational statistics 
of amino acids within long unbiased MD trajectories of 
each stable substate. We found extensive asymmetries in 
both the dynamical fluctuations and correlations. The 
most significant effect of TR binding was found to be on 
the FG loop of the B* monomer when comparing the AB 
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and AB:TR conformations. We find a pathway of strong 
communication along a spine of residues between the TR 
binding region and the B* FG loop, thus identifying how 
the conformational landscape can be so strongly shifted 
upon TR binding to stabilize the AB conformation. 

Comparison to previous results. Previous exper¬ 
iments on MS2 have shown that TR binding induces a 
conformation change from a symmetric to an asymmetric 
structure 12 . Based on this and other evidence it has been 
inferred that the CC state is preferred in the absence of 
TR, and the AB state in the presence of TR. Our results 
from the string method calculation and the associated 
free energy profile directly support this conclusion, and 
also reveal that the associated transition pathways differ 
in the presence of TR. 

A study by Dykeman et al 13 performed an all-atom 
normal mode analysis to determine how the vibrational 
modes are modified by RNA binding. They found that 
TR binding to a (WT) CC conformation causes asym¬ 
metric fluctuations of the EF loop; fluctuations increase 
in A* and decrease in B*. The mutant Trp82Arg has 
an asymmetry in the DE loop instead, which was pro¬ 
posed as a possible explanation for why it is assembly- 
incompetent. Our MSF calculations also find asymme¬ 
tries upon TR binding in the EF loop. While there is 
little difference between the FG loop fluctuations in the 
CC and CC:TR systems, TR binding in the AB systems 
shows the fluctuations in the B* FG loop greatly de¬ 
creased upon TR binding. 

Limitations of our calculations and outlook. The 

relevance of the string method pathway and associated 
free energy profile depend on the extent to which the col¬ 
lective variables describe all relevant slow degrees of free¬ 
dom. Furthermore, recent computational studies have 
shown that conformational transitions can proceed by 
multiple, diverse pathways (e.g. 64 ), while a single string 
calculation typically samples only one transition tube. 
We have assessed several metrics to determine whether 
our set of collective variables was sufficient and the extent 
of sampling within trajectory space: (1) Convergence 
during string iterations was reasonably rapid. Missing 
slow degrees of freedom can be expected to slow con¬ 
vergence since they relax on a slow time scale. (2) The 
umbrella sampling calculations were performed in both 
directions along the pathway. A lack of significant hys¬ 
teresis between these two directions is consistent with 
inclusion of all relevant degrees of freedom. (3) Indepen¬ 
dent string calculations started from substantially differ¬ 
ent initial pathways led to very similar converged strings 
(Fig. A3), consistent with efficient sampling in trajectory 
space. Taken together, these results are consistent with 
a sufficient set of collective variables and broad sampling 
within trajectory space. Further testing of the validity of 
these calculations could be achieved by comparing the 
results to those of an independent technique, such as 
Markov State model (MSM) calculations. It is also pos¬ 
sible to use the converged string as a starting point for 
efficient construction of an MSM 64 . 


While we analyzed correlations of amino acid confor¬ 
mations within the stable conformational substates, fur¬ 
ther insight into the transition mechanism might be ob¬ 
tained by characterizing mutual information during the 
conformational transition. Finally, in this work, we fo¬ 
cused on the effects of TR binding on the coat protein 
dimer conformations. A natural next step is to examine 
the effect of TR binding on dimer-dimer interactions; e.g. 
Ref. 14 . 
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Appendix A: Additional methods details 
1. Equilibration 

Each of the 4 systems was relaxed from its initial con¬ 
figuration as follows. First, the system was minimized 
while iteratively relaxing harmonic restraints on all pro¬ 
tein heavy atoms, centered on crystal structure positions. 
Next, MD simulations were performed in which the same 
restraints, now centered on the final minimized position, 
were slowly relaxed as the temperature was gradually in¬ 
creased from 25K to 300K. Unbiased MD was then per¬ 
formed for 50-100ns to ensure equilibration. To prevent 
self-interaction, rotational drift was limited by harmonic 
restraints on the <a-carbons of residues 105-109 in the A 
subunit a-helix, which is located on the top of the dimer 
far from the FG-loop and RNA binding sites. 

Of the four systems, only the AB dimer undergoes sig¬ 
nificant rearrangement. Trp82 in the FG loop rotated 
and the rest of the FG loop moved toward the EF hair¬ 
pin of the A monomer; in contrast, the AB:TR FG loop 
remained close to the DE loop of the B monomer. The 
difference between equilibrated AB and AB:TR states is 
significant because Trp82 is a large side chain which rear¬ 
ranges substantially during the conformational changes. 


Appendix B: String method calculations 

This section presents details on the string method 
calculation. Following Ovchinnikov et al. 6 , we de¬ 
fine 7V cv collective variables that depend on the Carte¬ 
sian positions x of atoms in the protein as 0(x) = 
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^i(x), §2 (x) ,..., ^n cv (x)^ . Each image n of the string 
evolves according to 6 

0 n (t + A t) = G n (t) - 7 - 1 AtM(0 n (t))VG(0 n (t)) (Bl) 


3. Reparameterize. Redefine the locations of im¬ 
ages along the string so that they are uniformly 
spaced in arc length, following the implementation 
of Maragliano et al. 5 . 


where 0 n (t) gives the collective variable values of image 
n from string iteration t and 7 is a tuneable “friction 
constant” that sets the size of the step taken down the 
free energy gradient (along with At). The metric ten¬ 
sor M(0(t)) accounts for the curvilinear nature of the 
collective variables and is given by 6 


M ij {e) = Y J 

k 


1 

m k 


dOj (x) dOj(x.) \ 
dx k dx k ] Hx)=e 


(B2) 


where the sum ranges over each coordinate k for all atoms 
in the system, (...) denotes an average over sampling 
constrained in the vicinity of 0, and m k is the mass of 
atom k. 

We made two simplifying approximations in our im¬ 
plementation. Since we used only Cartesian coordi¬ 
nates for collective variables, we approximated the tensor 
M{0 n (t )) in Eq. (Bl) as the identity matrix. Tests with 
and without this approximation supported that the met¬ 
ric tensor can be neglected for our system. 

The second approximation was to dynamically set 
7 -1 At (from Eq. (Bl)) such that the step size is a fixed 
fraction of the image spacing. This guarantees that 
new images will not jump too far in any given itera¬ 
tion. With the alanine dipeptide model system, we ex¬ 
tensively tested our implementation with both approxi¬ 
mations against a string implementation with collective 
variables based on dihedral angles. 

To identify collective variables sufficient to describe the 
transition between states, we systematically vetted can¬ 
didate coordinates using restrained targeted molecular 
dynamics simulations 6 (described for our systems in Ap¬ 
pendix IIC). Next, we used TMD to generate an initial 
string connecting the two metastable states. This path¬ 
way was then discretized into images, and the string was 
systematically relaxed by the following iterative proce¬ 
dure. 


1. Sample. For each image n, run short simulations 
to estimate VG(0 n ) (the free energy gradient in 
collective variable space in the proximity of image 
n). In each short simulation, impose a harmonic 
potential for each collective variable, centered on 
the image. The spring constant of the harmonic 
potential is selected to keep sampling in the vicin¬ 
ity of its image (typically with an average sampling 
radius of 1-2 image spacings). Calculate the aver¬ 
age force imposed by each potential. 

2 . Evolve. Generate a new string by displacing each 
image a distance S in the direction opposite to the 
free energy gradient. In our implementation, S is 
scaled to be a fixed fraction of the image spacing. 


This procedure was iterated until the string path¬ 
way approximately converged, which was assessed by the 
RMSD between the initial and current strings. We define 
the RMSD between strings as 

/ rl \ V 2 

RMSD( 6 » 5 l , 6 » 52 ) = I / \d Sl (a)-6 S2 (a)\ 2 da j 

(B3) 

with G Si (a) as the A^ cv -dimensional point at fraction a 
along string S{. Since the strings are discretized, it is 
necessary to interpolate between images. 


1. Generating the Initial String 

Initial strings were generated from TMD trajectories 
of the CC—?>AB and CC:TR-^AB:TR transitions, with 
the TMD bias based only on CV atoms. Coordinates 
were saved every 2 ps and used to construct a time se¬ 
ries of CV values, which was then smoothed to prevent 
noise from dominating image selection. The data was 
smoothed by applying a nearest neighbor smoothing ker¬ 
nel to the coordinates for 10-20 iterations. Forty im¬ 
ages, with approximately equal spacing, were then se¬ 
lected from the smoothed trajectory and used as the 
initial string pathway. The spacing between the N cv - 
dimensional images in the initial string was 3.5A for CP 2 
and 2.5A for CP 2 :TR, which provided sufficient resolu¬ 
tion to capture bond-breaking and all significant confor¬ 
mational rearrangements. 

TMD parameters. During selection of collective 
variables and generation of initial pathways, TMD simu¬ 
lations imposed a harmonic potential as a function of the 
RMSD difference between the current and target struc¬ 
ture, measured from the positions of the candidate CV 
atoms only. The center of the potential was moved lin¬ 
early from the RMSD of the initial configuration to 0 
over 1.5 ns. The spring constant was linearly scaled from 
k = 2.5 x 10 6 kJ/mol • nm 2 to k = 5 x 10 6 kJ/mol • nm 2 
over this same interval. After centering on RMSD= 0, 
k was linearly increased over three separate 500ps inter¬ 
vals to k = (2,20,200) x 10 7 kJ/mol • nm 2 . After this, k 
was linearly decreased to 0 over Ins, followed by 4ns of 
unbiased simulation. 


2. Running the String 

Each string was evolved according to the three steps 
outlined at the beginning of Appendix B: sample, evolve, 
reparameterize. In the sample step, for each image n, 
the structure from the previous (or initial) string clos¬ 
est in CV space to the image CV values 0 n was sub- 
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(a) 


MS2 w/o RNA 


(b) 


MS2 w/RNA 



FIG. Al. The RMSD during the equilibration of the (a) CP 2 
and (b) CP 2 :TR strings. The RMSD is calculated according 
to Eq. (B3) with respect to both the initial string and the 
previous string iteration. 
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FIG. A2. (a) The string RMSD as calculated from Eq. (B3) 
during the convergence of a CP 2 string. The strings at itera¬ 
tions 30 and 63 were taken for the free energy calculation (as 
marked by the stars), (b) The free energy profile for each of 
the two converged CP 2 strings as a function of arc length a. 


jected to a steered molecular dynamics (SMD) simu¬ 
lation targeting 0 n . A harmonic potential with force 
constant fcdmg = 10 5 kJ/nm 2 was imposed for each CV, 
and moved linearly to 0 n (at a speed of no faster than 
0.1nm/(1000steps)). Then, we sampled the local free en¬ 
ergy gradient by performing MD for an additional 200ps 
with harmonic restraints for the CV centered at 6 n . To 
maintain local sampling while speeding convergence, we 
chose a restraint force constant of &hoid = 450.0kJ/nm 2 , 
yielding an average sampling radius of 1-2 image spac- 
ings. CV values were recorded every O.lps. The string 
was then evolved by updating CV values according to 
Eq. Bl, with a step size set to a fixed fraction S = 0.5 
of the image spacing, followed by reparameterization to 
maintain uniform spacing along the arc length. 

To monitor string convergence, we calculated the 
RMSD (in CV space) between points at equal arc length 
along the string according to Eq. B3. We used a linear in¬ 
terpolation between neighboring images to calculate the 
RMSD at arc lengths not commensurate with image loca¬ 
tions. Each string was run until the RMSD with respect 
to the initial string plateaued, which required 50-100 it¬ 
erations (Fig. Al). 


3. String Convergence and Validity 

To test the assumption that a plateau of the RMSD 
in CV space is a good measure of string convergence, we 
calculated the CC^AB free energy profile for two string 
iterations after the RMSD plateau (Fig. A2). Although 
the two free energy profiles are not identical, they obtain 
the same free energy difference between CC and AB sub¬ 
states, nearly the same barrier height, and both have a 
single on-pathway metastable state. 

To assess global convergence of the string, we 
performed a second string calculation for the 
CC:TRv^AB:TR transition, initialized from a dif¬ 
ferent TMD simulation. This TMD used a slightly 
different definition for the CC:TR and AB:TR substates, 
and produced an initial pathway which differs substan¬ 
tially from the initial pathway used for the first string. 
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string calculations 


Free Energy for two 
CP 2 :TR strings 



• string 1 
■ string 2 
5|c selected for free 
energy calculation 

°0 20 40 60 80 100 120 140 

iteration 



FIG. A3, (a) The RMSD with respect to the initial string 
(Eq. B3) during the convergence of CP 2 :TR strings initialized 
from two independent TMDs. (b) The free energy profiles for 
the two final CP 2 :TR strings as a function of arc length a. 


The RMSD of 13 A between the two initial pathways 
is approximately as large as the RMSD between the 
first converged string and its initial pathway. The 
convergence and resulting free energy profiles are shown 
in Fig. A3. Once again, the two strings result in the 
same relative free energies for the CC:TR and AB:TR 
substates and contain the same number (two) of on- 
pathway metastable states. While there are quantitative 
differences, the overall similarity between the two 
calculations suggests that the strings have converged 
to the same pathway. This result from two different 
initial pathways is consistent with a global MFTP, 
although a thorough assessment would require a number 
of additional strings and hence a large computational 
cost. 


Appendix C: RMSF 

To validate that the sampling used is sufficient for the 
RMSF calculations we compare the per residue RMSF 
for the EF and FG loop regions for the CC:TR system 
using the full data set of 2.576/rs and using half the total 
trajectory frames. These results are presented in Fig. A4 
and are marked “full” and “half”. The RMSF values 
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FIG. A4. The per residue RMSFs of the EF and FG loop 
regions of the A* and B* monomers of the CC:TR dimer. The 
results using the full simulation data and half the simulation 
data are compared and show that the results obtained using 
the full simulation data are converged. 


obtained from using the full data set are the same as 
what was presented in Fig. 4a. The results show that 
the two data sets give very similar results and that the 
calculated RMSF values are converged. 
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FIG. A5. The similarity matrix for the mutual information 
matrix clusters for each CP 2 and CP 2 :TR system. The col¬ 
oring indicates the average mutual information and is scaled 
uniformly for all plots (as shown on the right in units of k^T). 
The last cluster in each plot contains amino acids that do not 
share MI with each other or other clusters, which is why the 
last diagonal element has a value near 0. 


Appendix D: Mutual Information Calculations 

To determine groups of amino acids that have corre¬ 
lated distributions, we used hierarchical clustering on the 
MI matrices. In hierarchical clustering, each amino acid 
starts in its own cluster, and clusters with minimal “dis¬ 
similarity” are recursively merged until only one cluster 
remains. For our calculation, the “dissimilarity” was de¬ 
termined by the intra-cluster average of Dij of Eq. (2). 
From the resulting hierarchy of clusters, we systemat¬ 
ically extracted the largest possible clusters, such that 
the intra-cluster MI average was greater than a certain 
cutoff value. Hierarchical clustering was applied to each 
of the four MI data sets. An intra-cluster average cutoff 
of 0.035 /cb^ was used for the CC system, which results 
in 7 distinct clusters. Increasing the cutoff to 0.04 /cb T 
lead to a jump in the number of clusters to 12, hence 
we decided to base our analysis on the smaller 7-cluster 
result. A similar approach was used for the other three 
dimer systems, where a cutoff was found to generate a 
reasonable number of clusters with low inter-cluster MI 
averages, but stayed below an intra-cluster average cut¬ 
off that would lead to a jump in the number of clusters. 
Hence, an intra-cluster average cutoff of 0.04 /cb T was 
used for the AB system, which results in 5 distinct clus¬ 
ters. Increasing the cutoff to 0.05 /cbT 1 lead to an increase 
in the number of clusters to 10. For the CC:TR sys¬ 


tem an intra-cluster average cutoff of 0.07 /cbT" was used, 
resulting in 8 distinct clusters. Increasing the cutoff to 
0.08 /cbT lead to an increase in the number of clusters to 
14. For the AB:TR system an intra-cluster average cut¬ 
off of 0.10 &bT was used, resulting in 5 distinct clusters. 
Increasing the cutoff to 0.11 &bT lead to an increase in 
the number of clusters to 10. 

The resulting clusters presented in Fig. 5 were tested 
to ensure that they have a high intra-cluster correlation 
average and a low inter-cluster correlation average (as 
calculated from Eq. 3). The resulting correlations are 
shown in Fig. A5, where the diagonal shows strong intra¬ 
cluster correlations. 

The MI matrix for the CC dimer was clustered us¬ 
ing all of the 7 trajectories (2.576/is) as shown in Fig. 5 
and using 4 trajectories (1.472/is). The resulting clusters 
for both sampled MI data sets for CC are compared in 
Fig. A6. Using an intra-cluster average cutoff of 0.05 /cbT 
lead to 8 distinct clusters. The cluster identifications are 
similar in for the 2.576/is and 1.472/is sampled systems. 
Importantly, the FG loops in the A* and B* monomers 
are clustered with the large blue cluster that includes the 
TR binding sites. Hence, we conclude that the sampling 
is converged for the calculation of the MI clusters. 
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FIG. A6. The clusters of the MI matrix for the CC system 
using the 2.576/xs and 1.472/xs of sampling. The similarity in 
the resulting clusters indicates that the sampling is converged. 
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